Greg Detre
Monday, October 07, 2002
observable environment, restricted move-space, yet the games themselves are hard to play – will not be able to search the entire tree
Agenda
adversarial search
alpha-beta pruning to speed up search
evaluation functions to approximate search
improve heuristics
Antony: would you necessarily need to be able to see more moves ahead to beat Kasparov?
it depends on your pruning, and moreover, on your evaluation functions – so, assuming we can do more search than Kasparov, he must be evaluating the board better
assume that pruning and evaluation can be pretty independent
www.brainsinbahrain.com
latest chess competition, pitting the new world champion (Kramnik) against the Deep Fritz program
he used his signature Berlin Wall formation to draw, because he knew that, playing black, he could (in theory) only draw
minimax
crucial assumption: that the opponent plays rationally
apparently, if your opponent plays sub-optimally and you assume they're going to be rational, you can only do as well as or better than the minimax value (assuming you can search the whole tree)
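The minimax idea in a minimal sketch – the nested-list tree (leaves as utilities for the maximiser) is just an illustration, not from the lecture:

```python
def minimax(node, maximizing=True):
    # Leaves are numbers (utilities for the maximiser);
    # internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    children = [minimax(c, not maximizing) for c in node]
    return max(children) if maximizing else min(children)

# a 2-ply tree: max to move, then min
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```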
alphabeta
alpha is the lower bound on the value of a max node
beta is the upper bound on the value of a min node
(i.e. the value of a min node is always going to be less than or equal to this – it can only get smaller as more children are examined)
used the alpha value to prune at the min node
so you use these to help prune the rest of your search
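A sketch of how alpha and beta are threaded through the search, on the same kind of nested-list tree (illustrative, not the lecturer's code):

```python
def alphabeta(node, maximizing=True, alpha=float('-inf'), beta=float('inf')):
    # Leaves are numbers; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # min node above will never allow this line
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:   # prune: max above already has >= alpha
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3 – same answer as plain minimax, fewer nodes visited
```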
transposition tables
a transposition is a different permutation of a move sequence that results in the same position – the table itself is just a hash table keyed by position
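Sketch of the memoisation: search values are cached keyed by position, so transpositions (different move orders reaching the same position) are only searched once. Real programs hash the board (e.g. Zobrist hashing); here the "position" is just the nested-list subtree keyed by its repr – an assumption for illustration:

```python
table = {}

def minimax_tt(node, maximizing=True):
    key = (repr(node), maximizing)
    if key in table:
        return table[key]   # position already evaluated via another move order
    if not isinstance(node, list):
        value = node
    else:
        children = [minimax_tt(c, not maximizing) for c in node]
        value = max(children) if maximizing else min(children)
    table[key] = value
    return value

# the repeated subtree is only searched once, then looked up
tree = [[3, 12, 8], [3, 12, 8]]
print(minimax_tt(tree))  # 3
```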
tic-tac-toe
use a 2-ply search
one of our moves, and one of our opponent's moves
a ply is just a level in a search tree
use symmetry – don't enumerate all of the tree, e.g. the first level has only 3 distinct moves (corner, edge, centre)
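Checking that claim in code: under the 8 symmetries of the square, the 9 opening moves collapse to 3 equivalence classes (this little canonicalisation scheme is my own illustration):

```python
def symmetries(board):
    def rot(b):                       # rotate 90 degrees
        return tuple(zip(*b[::-1]))
    def ref(b):                       # reflect horizontally
        return tuple(row[::-1] for row in b)
    out, b = [], board
    for _ in range(4):
        out.append(b)
        out.append(ref(b))
        b = rot(b)
    return out

distinct = set()
for r in range(3):
    for c in range(3):
        b = [[0] * 3 for _ in range(3)]
        b[r][c] = 1                   # place the first move at (r, c)
        b = tuple(tuple(row) for row in b)
        distinct.add(min(symmetries(b)))  # canonical form of this move
print(len(distinct))  # 3
```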
backgammon
Tesauro's TD-Gammon (early 1990s) – best computer player in the world; the computer learnt by playing copies of itself (rather than against an expert), uses a neural net
initially, Neurogammon used a NN, fed it expert advice – very tedious for the expert – 20,000 positions – only mediocre, could be beaten by its expert trainer
"foregoing tutelage of masters places us at a disadvantage but is also liberating – not hindered by biases or prejudices"
what it learned has led to a rewriting of the backgammon textbooks
similar to go, in that there are many board positions
search space is huge, because of the two dice (c. 20 legal moves per position × c. 20 distinct dice rolls, so a branching factor of about 400)
need randomness to perform search, and avoid pathologies
learn key concepts first
still 2-ply search
quiescence search
a fixed cut-off depth is not very robust, and can lead to problems (e.g. if there's a great move one ply beyond your search)
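A sketch of the quiescence idea – at the cut-off, keep expanding only the "noisy" moves (captures, checks) until the position is quiet, rather than trusting the static evaluation of a volatile position. The toy game interface here is entirely made up for illustration:

```python
def quiesce(state, game, maximizing=True):
    noisy = game.noisy_moves(state)
    if not noisy:                      # quiet position: trust the static eval
        return game.evaluate(state)
    values = [quiesce(game.result(state, m), game, not maximizing)
              for m in noisy]
    return max(values) if maximizing else min(values)

class ToyGame:
    # hypothetical stand-in: a state is (static_eval, [noisy successor states])
    def evaluate(self, s): return s[0]
    def noisy_moves(self, s): return range(len(s[1]))
    def result(self, s, m): return s[1][m]

# statically the position looks like +5, but a forced capture just beyond
# the cut-off resolves it to -2 – exactly what a fixed depth would miss
pos = (5, [(-2, [])])
print(quiesce(pos, ToyGame()))  # -2
```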
horizon effect
appears from a limited depth of search – need a heuristic to evaluate apparently very successful moves whose consequences are postponed beyond the horizon, but which might just require a 20+ move "ladder" to resolve
iterative deepening
Deep Blue used iterative deepening – it had to move against the clock
when the clock runs out, we use the solution from the deepest fully completed search
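A sketch of iterative deepening against a clock (the time limit and depth cap are arbitrary, and the depth-0 "static evaluation" is just a stub returning 0):

```python
import time

def depth_limited_value(node, depth, maximizing=True):
    if not isinstance(node, list):
        return node                    # true terminal value
    if depth == 0:
        return 0                       # static evaluation stub at the cut-off
    children = [depth_limited_value(c, depth - 1, not maximizing)
                for c in node]
    return max(children) if maximizing else min(children)

def iterative_deepening(tree, time_limit=0.05, max_depth=10):
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break                      # clock ran out: keep the last...
        best = depth_limited_value(tree, depth)   # ...completed search
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(iterative_deepening(tree))  # 3 (depth 2 already reaches the leaves)
```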
you can't trust alphabeta with any confidence unless the full breadth of the tree has been searched
2GHz PC – 200 million nodes/move in 3 minutes (approx 10^6 nodes/sec)
minimax: approx 35^5 = c. 50 million nodes, i.e. about 5 ply
alphabeta, get to approx 10 ply (expert level)
additional tricks, e.g. metareasoning (14 ply)
database and extension tuning, supercomputer, special hardware – Deep Blue does 126×10^6 nodes/sec
branching factor, holism
you have databases
most of all though, chess has a point-scoring algorithm that works really well
chess uses searches – lower branching (you can prune very effectively, down to about 14 moves sometimes)
whereas it's really hard to evaluate your own position in go
and also depends a great deal on what your opponent does(???)
go programs don't use search – they're based mainly around shapes and rules
less easy to specify your goal state(???) – Niall says that chess programs basically just look for checkmates, whereas you're balancing so many things with Go when thinking about goal states – for starters, I don't know how hard it is to figure out when the game has actually ended
interestingly, it looks from James Luke's talk as though they depended heavily on expert tweaking, rather than machine learning techniques, right???
alpha-beta vs minimax???
feature evaluation function???
transposition table??? a lookup table to avoid re-searching positions reached by different move orders???
presumably you could use statistical methods to assign general values to different chess pieces???
but you need a pretty expressive way to represent the states to do that given a really big search space (e.g. chess), rather than just a big lookup table of chess boards
e.g. you break the board down into features (higher-dimensional representation of the state)
these features need to be easy to compute
often need an expert to identify useful features
Parkes: maybe the problem in Go is that there are lots of different features that you could use
if you just add the features up linearly, that's easiest – but it doesn't allow for interactions between the features (unless you expand the number of features, i.e. to include the interaction that you're looking for)
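The linear case in a sketch – a weighted sum of hand-chosen features; the material-count features and the classic 1/3/3/5/9 piece weights are standard chess lore rather than anything from the lecture:

```python
def material_features(counts):
    # counts: dict like {'P': 8, 'N': 2, ...} of own-minus-opponent piece counts
    order = ['P', 'N', 'B', 'R', 'Q']
    return [counts.get(p, 0) for p in order]

WEIGHTS = [1, 3, 3, 5, 9]   # pawn, knight, bishop, rook, queen

def evaluate(counts):
    # eval(s) = w . f(s): easy to compute, but blind to feature interactions
    # unless you add explicit product features
    return sum(w * f for w, f in zip(WEIGHTS, material_features(counts)))

# up a rook, down a pawn:
print(evaluate({'R': 1, 'P': -1}))  # 4
```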
temporal difference???